Goto

Collaborating Authors

 worst-case switching regret


Reviews: Equipping Experts/Bandits with Long-term Memory

Neural Information Processing Systems

The main novelty in the full-information part of the results is the black-box reduction to the confidence-based framework. The main intuitive idea is that confidences are generated for experts based on how good they have looked in the past/how bad regret has been with respect to them so far (Step 7 of Algorithm 1), and the confidence is multiplied by the probabilities generated by the actual expert algorithm *with a static regret guarantee*. The algorithm and analysis provides a conceptual look into how switching regret minimization can be achieved and is interesting. Proof (of Theorem 3) appears essentially correct. These ideas also yield new results for the sparse bandit version of the problem, which has seen recent progress.